A corpus of science journalism for analyzing writing quality

Authors

  • Annie Louis
  • Ani Nenkova
Abstract

We introduce a corpus of science journalism articles, categorized in three levels of writing quality. The corpus fulfills a glaring need for realistic data on which applications concerned with predicting text quality can be developed and evaluated. In this article we describe how we identified, guided by the judgements of renowned journalists, samples of excellent, very good and typical writing. The first category comprises extraordinarily well-written pieces as identified by expert journalists. We expanded this set with other articles written by the authors of these excellent samples to form a set of very good writing samples. In addition, our corpus comprises a larger set of typical journalistic writing. We provide details about the corpus and the text quality evaluations it can support. Our intention is to further extend the corpus with annotations of phenomena that reveal quantifiable differences between levels of writing quality. Here we introduce two such annotations that have promise for distinguishing amazing from typical writing: text generality/specificity and communicative goals. We present manual annotation experiments for specificity of text and also explore the feasibility of acquiring these annotations automatically. For communicative goals, we present an automatic clustering method to explore the possible set of communicative goals and develop guidelines for future manual annotations. We find that the annotation of general/specific nature at the sentence level can be performed reasonably accurately fully automatically, while automatic annotations of communicative goals reveal salient characteristics of journalistic writing but do not align with categories we wish to annotate in future work. Still, with the current automatic annotations, we provide evidence that features based on specificity and communicative goals are indeed predictive of writing quality.

Similar resources

What Makes Writing Great? First Experiments on Article Quality Prediction in the Science Journalism Domain

Great writing is rare and highly admired. Readers seek out articles that are beautifully written, informative and entertaining. Yet information-access technologies lack capabilities for predicting article quality at this level. In this paper we present first experiments on article quality prediction in the science journalism domain. We introduce a corpus of great pieces of science journalism, a...

Predicting Text Quality for Scientific Articles

My work aims to build a system to automatically predict the writing quality in scientific articles from two genres—academic publications and science journalism. Our goal is to employ these predictions for article recommendation systems and to provide feedback during

Hedges in English for Academic Purposes: A Corpus-based study of Iranian EFL learners

Hedges, as tools to express tentativeness and doubt, have been studied in plenty of research papers in the Iranian EFL research setting. However, their use in a learner corpus, portraying Iranian learner English, is in need of more research attention. With this end in view, this study aimed at investigating how Iranian EFL learners who have majored in English-related fields in Iran deployed hed...

The Impact of Teaching Corpus-based Collocation on EFL Learners' Writing Ability

Abstract The present study explores the impact of corpus-based collocation instruction on intermediate Iranian EFL learners' writing ability. For this study, 84 Iranian learners, studying English as a foreign language in Bayan Institute, Iran, were selected and were randomly divided into two groups, experimental and control. Conventional methods of writing instruction were taught to the control...

Journal:
  • D&D

Volume 4

Publication date: 2013